In this lab we will learn the basics of analyzing imaging data. There is a corresponding quiz on Canvas – the questions are dispersed throughout this lab (marked Quiz question). There are also additional questions which you do not have to turn in.
Install packages.
pkgs_needed = c("EBImage","magrittr","tibble","ggplot2","genefilter",
"GGally", "MSMB")
letsinstall = setdiff(pkgs_needed, installed.packages())
if (length(letsinstall) > 0) {
BiocManager::install(letsinstall)
}
Load packages.
library("EBImage")
library("magrittr")
library("tibble")
library("ggplot2")
library("genefilter")
library("GGally")
knitr::opts_chunk$set(echo = TRUE)
A useful toolkit for handling images in R is the Bioconductor package EBImage. We start out by reading in a simple picture to demonstrate the basic functions.
imagefile = system.file("images", "mosquito.png",
package = "MSMB")
mosq = readImage(imagefile)
Let’s visualize the image that we just read in. The basic function is display from the EBImage package. If you do this from the R console (instead of knitting the document), then this will allow you to zoom in and scroll around.
EBImage::display(mosq)
We can also display the image using R’s built-in plotting by calling display with the argument method = "raster". The image then goes to the current device. In this way, we can combine image data with other plotting functionality, for instance, to add text labels.
EBImage::display(mosq, method = "raster")
text(x = 85, y = 800, label = "A mosquito",
adj = 0, col = "orange", cex = 1.5)
Note that we can also read and view color images.
imagefile = system.file("images", "hiv.png", package = "MSMB")
hivc = readImage(imagefile)
EBImage::display(hivc, method = "raster")
Furthermore, if an image has multiple frames, they can be displayed all at once in a grid arrangement by specifying the function argument all = TRUE,
nuc = readImage(system.file("images", "nuclei.tif", package = "EBImage"))
EBImage::display(1 - nuc, method = "raster", all = TRUE)
or we can just view a single frame, for instance, the second one.
EBImage::display(1 - nuc, method = "raster", frame = 2)
Quiz question 1: What is the smallest value in the nuc frames?
Answer:
min(nuc@.Data)
## [1] 0
Quiz question 2: What is the largest value in the nuc frames?
Answer:
max(nuc@.Data)
## [1] 1
Quiz question 3: What does 1 - nuc do?
Answer:
Inverts the nuc image
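Since pixel intensities lie in the interval [0, 1], subtracting from 1 flips dark and bright. A minimal numeric sketch:

```r
# Intensities in [0, 1]: 1 - x maps dark (0) to bright (1) and vice versa
px = c(0, 0.25, 0.9, 1)
1 - px
## [1] 1.00 0.75 0.10 0.00
```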
We read the image hivc from a file in PNG format, so let’s now write it out as a JPEG file. The underlying JPEG software library lets us choose a quality value between 1 and 100 as a parameter for its compression algorithm, which is exposed in the writeImage function by its argument quality. The default value is 100. Here we use a smaller value. This leads to smaller file size at the cost of some reduction in image quality.
writeImage(hivc, "hivc.jpeg", quality = 85)
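To see the effect of the quality setting, we can write the same image at two settings and compare the resulting file sizes (the filenames here are arbitrary choices for this sketch):

```r
# Write the same image at high and low JPEG quality
writeImage(hivc, "hivc-q100.jpeg", quality = 100)
writeImage(hivc, "hivc-q50.jpeg",  quality = 50)
# The low-quality file should be substantially smaller
file.size(c("hivc-q100.jpeg", "hivc-q50.jpeg"))
```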
Let’s dig into what’s going on by first identifying the class of the images we are plotting:
class(mosq)
## [1] "Image"
## attr(,"package")
## [1] "EBImage"
So we see that this object has the class Image. This is not one of the base R classes, rather, it is defined by the package EBImage. We can find out more about this class through the help browser or by typing ?Image. The class is derived from the base R class array, so you can do with Image objects everything that you can do with R arrays; in addition, they have some extra features and behaviors. In R’s parlance, the extra features are called slots and the behaviors are called methods; methods are a special kind of function.
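Because Image is derived from array, ordinary array operations work directly on it; for example:

```r
# An Image is also an array, so array idioms apply unchanged
is(mosq, "array")       # TRUE
mean(mosq)              # mean pixel intensity, as for any numeric array
dim(mosq[1:10, 1:10])   # subsetting works like matrix subsetting
```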
Quiz question 4: How many slots does an Image object have?
Answer: 2
slotNames(hivc)
## [1] ".Data" "colormode"
Quiz question 5: What is the pixel value of mosq at position [1,1]?
Answer: 0.196
mosq[1,1]
## Image
## colorMode : Grayscale
## storage.mode : double
## dim : 1 1
## frames.total : 1
## frames.render: 1
##
## imageData(object)[1:1,1:1]
## [1] 0.1960784
The dimensions of the image can be extracted using the dim method, just like for regular arrays.
dim(mosq)
## [1] 1400 952
The hist method has been redefined:
hist(mosq)
If we want to directly access the data matrix as an R array, we can use the accessor function imageData:
imageData(mosq)[1:3, 1:6]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784
## [2,] 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784
## [3,] 0.1960784 0.1960784 0.2000000 0.2039216 0.2000000 0.1960784
Compare mosq and hivc by printing them.
mosq
## Image
## colorMode : Grayscale
## storage.mode : double
## dim : 1400 952
## frames.total : 1
## frames.render: 1
##
## imageData(object)[1:5,1:6]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784
## [2,] 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784 0.1960784
## [3,] 0.1960784 0.1960784 0.2000000 0.2039216 0.2000000 0.1960784
## [4,] 0.1960784 0.1960784 0.2039216 0.2078431 0.2000000 0.1960784
## [5,] 0.1960784 0.2000000 0.2117647 0.2156863 0.2000000 0.1921569
hivc
## Image
## colorMode : Color
## storage.mode : double
## dim : 1400 930 3
## frames.total : 3
## frames.render: 1
##
## imageData(object)[1:5,1:6,1]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0
Quiz question 6: How many color channels are in mosq?
Answer: 1
Quiz question 7: How many color channels are in hivc?
Answer: 3
The two images differ by their property colorMode, which is Grayscale for mosq and Color for hivc. What is the point of this property? It turns out to be convenient when we are dealing with stacks of images. If colorMode is Grayscale, then the third and all higher dimensions of the array are considered as separate image frames corresponding, for instance, to different \(z\)-positions, time points, replicates, etc. On the other hand, if colorMode is Color, then the three dimensions are assumed to hold different color channels, and only the fourth and higher dimensions – if present – are used for multiple image frames.
Quiz question 8: How many color channels are in nuc?
Answer: 1
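We can see the effect of colorMode directly by reinterpreting the same data; here is a sketch on a copy of nuc:

```r
# Switch a copy of nuc to Color mode: the 3rd array dimension is then
# read as color channels rather than as separate frames
nuc2 = nuc
colorMode(nuc2) = Color
numberOfFrames(nuc,  type = "render")   # 4 frames when Grayscale
numberOfFrames(nuc2, type = "render")   # a single color frame when Color
```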
R stores the data nuc as
nuc
## Image
## colorMode : Grayscale
## storage.mode : double
## dim : 510 510 4
## frames.total : 4
## frames.render: 4
##
## imageData(object)[1:5,1:6,1]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.06274510 0.07450980 0.07058824 0.08235294 0.10588235 0.09803922
## [2,] 0.06274510 0.05882353 0.07843137 0.09019608 0.09019608 0.10588235
## [3,] 0.06666667 0.06666667 0.08235294 0.07843137 0.09411765 0.09411765
## [4,] 0.06666667 0.06666667 0.07058824 0.08627451 0.08627451 0.09803922
## [5,] 0.05882353 0.06666667 0.07058824 0.08235294 0.09411765 0.10588235
dim(imageData(nuc))
## [1] 510 510 4
We see that we have 4 frames in total; since colorMode is Grayscale, all 4 are treated as separate image frames and rendered individually (frames.render is also 4).
Now that we know that images are stored as arrays of numbers in R, our method of manipulating images becomes clear – simple algebra! For example, we can take our original image and flip the bright areas to dark and vice versa by simply subtracting the image from its maximum value, 1.
mosqinv = 1 - mosq
EBImage::display(mosqinv, method = "raster")
Here we use the fact that we know the minimum and maximum values of the image to be 0 and 1. In the more general case, the expression normalize(-mosq) would be safer.
EBImage::display(normalize(-mosq), method = "raster")
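normalize rescales whatever range the input data have onto the full [0, 1] interval, which we can check:

```r
# normalize maps the minimum to 0 and the maximum to 1 by default
range(normalize(-mosq))
## [1] 0 1
```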
We could also adjust the contrast through multiplication
mosqcont = mosq * 3
EBImage::display(mosqcont, method = "raster")
and the gamma-factor through exponentiation.
mosqexp = mosq ^ (1/3)
EBImage::display(mosqexp, method = "raster")
Furthermore, we can crop,
mosqcrop = mosq[100:438, 112:550]
EBImage::display(mosqcrop, method = "raster")
threshold
mosqthresh = mosq > 0.5
EBImage::display(mosqthresh, method = "raster")
and transpose images with matrix operations
mosqtransp = transpose(mosq)
EBImage::display(mosqtransp, method = "raster")
Quiz question 9: What is the object class of mosqthresh, the result of the thresholding?
Answer: Image (a class defined in the EBImage package)
str(mosqthresh)
## Formal class 'Image' [package "EBImage"] with 2 slots
## ..@ .Data : logi [1:1400, 1:952] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..@ colormode: int 0
We just saw one type of spatial transformation, transposition, but there are many more – here are some examples:
mosqrot = EBImage::rotate(mosq, angle = 30)
mosqshift = translate(mosq, v = c(40, 70))
mosqflip = flip(mosq)
mosqflop = flop(mosq)
rotate rotates the image clockwise by the given angle; translate moves the image by the specified two-dimensional vector (pixels that end up outside the image region are cropped, and pixels that enter into the image region are set to zero). The functions flip and flop reflect the image around the central horizontal and vertical axis, respectively.
Plot them:
EBImage::display(mosqrot,   method = "raster")
EBImage::display(mosqshift, method = "raster")
EBImage::display(mosqflip,  method = "raster")
EBImage::display(mosqflop,  method = "raster")
Let’s now switch to an application in cell biology. We load images of human cancer cells that were studied by Laufer, Fischer and co-workers (Laufer et al. 2013).
imagefiles = system.file("images",
c("image-DAPI.tif", "image-FITC.tif", "image-Cy3.tif"),
package="MSMB")
cells = readImage(imagefiles)
cells
Our goal now is to computationally identify and quantitatively characterize the cells in these images. However, before we can start with real work, we need to deal with a slightly mundane data conversion issue. This is, of course, not unusual. Let us inspect the dynamic range (the minimum and the maximum value) of the images.
apply(cells, 3, range)
## image-DAPI image-FITC image-Cy3
## [1,] 0.001586938 0.002899214 0.001663233
## [2,] 0.031204700 0.062485695 0.055710689
We see that the maximum values are small numbers well below 1. We can rescale these data to approximately cover the range \([0,1]\) as
cells[,,1] = 32 * cells[,,1]
cells[,,2:3] = 16 * cells[,,2:3]
apply(cells, 3, range)
## image-DAPI image-FITC image-Cy3
## [1,] 0.05078202 0.04638743 0.02661173
## [2,] 0.99855039 0.99977111 0.89137102
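The hard-coded factors 32 and 16 are specific to this dataset. A more generic (though not numerically identical) alternative is normalize with separate = TRUE, which rescales each channel to exactly [0, 1]:

```r
# Sketch: rescale each channel independently to the full [0, 1] range
cellsNorm = normalize(cells, separate = TRUE, ft = c(0, 1))
apply(cellsNorm, 3, range)   # each channel now spans exactly 0 to 1
```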
Quiz question 10: How big is the cells object in R’s memory in Mb? Hint: Use object.size and format functions.
Answer:
format(object.size(cells), units="Mb")
## [1] "3.8 Mb"
Our first goal is segmentation of the images to identify the individual cells. We can start by removing local artifacts or noise from the images through smoothing. We use a Gaussian kernel to compute a weighted average intensity at all pixel positions. In EBImage, this is implemented by the functions makeBrush and filter2.
w = makeBrush(size = 51, shape = "gaussian", sigma = 7)
tibble(w = w[(nrow(w)+1)/2, ]) %>%
ggplot(aes(y = w, x = seq(along = w))) + geom_point()
nucSmooth = filter2(getFrame(cells, 1), w)
EBImage::display(nucSmooth, method = "raster")
Quiz question 11: What happens if you decrease the brush size?
Answer: Image is less blurred
To proceed, however, we use smaller smoothing bandwidths than the one displayed above: a sigma of 1 pixel for the DNA channel and 3 pixels for actin and tubulin.
cellsSmooth = Image(dim = dim(cells))
sigma = c(1, 3, 3)
for(i in seq_along(sigma)) {
cellsSmooth[, ,i] = filter2(
cells[,,i],
filter = makeBrush(size = 51, shape = "gaussian",
sigma = sigma[i])
)
}
EBImage::display(cellsSmooth, method = "raster", all = TRUE)
We now define a smoothing window, disc, whose size is 21 pixels, and therefore bigger than the nuclei we want to detect, but small compared to the length scales of the illumination artifact. We use it to threshold the smoothed image.
disc = makeBrush(21, "disc")
disc = disc / sum(disc)
offset = 0.02
nucThresh = (cellsSmooth[,,1] - filter2( cellsSmooth[,,1], disc ) > offset)
EBImage::display(nucThresh, method = "raster")
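What this adaptive thresholding does: filter2(x, disc) computes the local mean intensity within the disc, and a pixel is called foreground when it exceeds that local mean by more than offset. A toy illustration on a made-up 9 x 9 image:

```r
# Hypothetical flat image with one bright pixel
img = Image(matrix(0.1, nrow = 9, ncol = 9))
img[5, 5] = 0.9
k = makeBrush(5, "disc")
k = k / sum(k)                  # normalize so filter2 yields a local mean
localMean = filter2(img, k)
# The bright pixel exceeds its local mean by far more than 0.02
imageData(img - localMean > 0.02)[5, 5]
## [1] TRUE
```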
The thresholded image nucThresh is not yet satisfactory. The boundaries of the nuclei are slightly rugged, and there is noise at the single-pixel level. An effective and simple way to remove these nuisances is given by a set of morphological operations.
Let us apply a morphological opening to our image.
nucOpened = EBImage::opening(nucThresh, kern = makeBrush(3, shape = "disc"))
EBImage::display(nucOpened, method = "raster")
The result of this is subtle, and you will have to zoom into the images to spot the differences, but this operation manages to smooth out some pixel-level features in the binary images that are undesirable for our application.
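A toy example makes the effect of opening easier to see: erosion followed by dilation removes foreground specks smaller than the structuring element, while larger objects survive intact (the 9 x 9 image here is made up):

```r
toy = Image(matrix(0, 9, 9))
toy[2, 2] = 1                  # a single-pixel speck
toy[5:8, 5:8] = 1              # a 4 x 4 object
toyOpened = opening(toy, makeBrush(3, shape = "box"))
sum(toy)                       # 17 foreground pixels before opening
## [1] 17
sum(toyOpened)                 # the speck is gone, the 4 x 4 block remains
## [1] 16
```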
The binary image nucOpened represents a segmentation of the image into foreground and background pixels, but not into individual nuclei. We can take one step further and extract individual objects defined as connected sets of pixels. In EBImage, there is a handy function for this purpose, bwlabel.
nucSeed = bwlabel(nucOpened)
The function returns an image, nucSeed, of integer values, where 0 represents the background and the integers greater than or equal to 1 are the indices of the different identified objects.
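On a tiny made-up binary image, bwlabel's behavior is easy to see:

```r
# Two disconnected 2 x 2 blobs on a 5 x 5 background
blobs = Image(matrix(0, 5, 5))
blobs[1:2, 1:2] = 1
blobs[4:5, 4:5] = 1
max(bwlabel(blobs))            # two connected components found
## [1] 2
```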
Quiz question 12: What is the index of the second largest cell in nucSeed? Hint: Use table function.
Answer: 1
sort(table(nucSeed), decreasing = TRUE)
## nucSeed
## 0 10 1 34 4 32 33 42 26 27 38
## 155250 537 521 498 468 443 413 394 380 380 376
## 23 2 41 31 43 20 17 24 18 35 37
## 345 342 342 321 316 309 304 287 285 256 231
## 40 5 30 39 28 25 21 16 12 15 13
## 231 222 222 214 209 203 195 189 184 184 180
## 36 19 8 22 7 6 3 9 11 14 29
## 169 167 159 148 125 122 120 117 117 116 9
We could use this information to remove objects that are too large or too small compared to what we expect.
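For instance, EBImage's rmObjects deletes objects by their index; the sketch below removes everything smaller than 100 pixels (the cutoff is an arbitrary choice for illustration):

```r
# Object sizes in pixels; "0" is the background and must be excluded
sizes = table(nucSeed)
tooSmall = setdiff(names(sizes)[sizes < 100], "0")
nucClean = rmObjects(nucSeed, index = as.integer(tooSmall))
EBImage::display(colorLabels(nucClean), method = "raster")
```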
To visualize such images, the function colorLabels is convenient, which converts the (grayscale) integer image into a color image, using distinct, arbitrarily chosen colors for each object.
EBImage::display(colorLabels(nucSeed), method = "raster")
The result is already encouraging, although we can spot two types of errors: neighboring nuclei that touch each other are sometimes merged into a single object, and some objects are small spurious fragments rather than real nuclei.
For statistical analyses of high-throughput images, we may choose to be satisfied with a simple method that does not rely on too many parameters or assumptions and delivers a perhaps sub-optimal, but rapid and good enough result. In this spirit, let us proceed with what we have. We generate a lenient foreground mask, which surely covers all nuclear stained regions, even though it also covers some regions between nuclei. To do so, we simply apply a second, less stringent adaptive thresholding.
nucMask = cellsSmooth[,, 1] - filter2(cellsSmooth[ , , 1], disc) > 0
and apply another morphological operation, fillHull, which fills holes that are surrounded by foreground pixels.
nucMask = fillHull(nucMask)
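A toy example of fillHull (the 5 x 5 image is made up):

```r
# A solid foreground block with a one-pixel hole in the middle
ring = Image(matrix(1, 5, 5))
ring[3, 3] = 0
sum(fillHull(ring))            # the enclosed hole has been filled
## [1] 25
```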
To improve nucSeed, we can now propagate its segmented objects until they fill the mask defined by nucMask. Boundaries between nuclei, in those places where the mask is connected, can be drawn by Voronoi tessellation, which is implemented in the function propagate.
nuclei = propagate(cellsSmooth[,,1], nucSeed, mask = nucMask)
EBImage::display(nuclei, method = "raster")
To determine a mask of cytoplasmic area in the images, let us explore a different way of thresholding, this time using a global threshold which we find by fitting a mixture model to the data. The histograms show the distributions of the pixel intensities in the actin image. We look at the data on the logarithmic scale, and zoom into the region where most of the data lie.
hist( log(cellsSmooth[,,3]) )
hist( log(cellsSmooth[,,3]), xlim = -c(3.6, 3.1), breaks = 300)
Looking at these histograms for many images, we can set up the following model for the purpose of segmentation: the signal in the cytoplasmic channels of cellsSmooth is a mixture of two distributions, a log-Normal background and a foreground with another, unspecified, rather flat, but mostly non-overlapping distribution. Moreover, the majority of pixels are from the background.
Code for estimating mixture parameters:
library("genefilter")
bgPars = function(x) {
x = log(x)
loc = half.range.mode( x )
left = (x - loc)[ x < loc ]
wid = sqrt( mean(left^2) )
c(loc = loc, wid = wid, thr = loc + 6*wid)
}
cellBg = apply(cellsSmooth, MARGIN = 3, FUN = bgPars)
cellBg
## [,1] [,2] [,3]
## loc -2.90176965 -2.94427499 -3.52191681
## wid 0.00635322 0.01121337 0.01528207
## thr -2.86365033 -2.87699477 -3.43022437
Visualizing the estimated parameters:
hist(log(cellsSmooth[,, 3]), xlim = -c(3.6, 3.1), breaks = 300)
abline(v = cellBg[c("loc", "thr"), 3], col = c("brown", "red"))
Thresholding to obtain cell bodies.
cytoplasmMask = (cellsSmooth[,,2] > exp(cellBg["thr", 2])) |
nuclei | (cellsSmooth[,,3] > exp(cellBg["thr", 3]))
EBImage::display(cytoplasmMask, method = "raster")
Dividing into cells:
cellbodies = propagate(x = cellsSmooth[ , , 3], seeds = nuclei,
lambda = 1.0e-2, mask = cytoplasmMask)
EBImage::display(colorLabels(cellbodies), method = "raster")
Overlay nuclei segmentation on first channel:
nucSegOnNuc = paintObjects(nuclei, tgt = toRGB(cells[, , 1]), col = "#ffff00")
EBImage::display(nucSegOnNuc, method = "raster")
Overlay nuclei segmentation with all three channels:
cellsColor = rgbImage(red = cells[,, 3],
green = cells[,, 2],
blue = cells[,, 1])
nucSegOnAll = paintObjects(nuclei, tgt = cellsColor, col = "#ffff00")
EBImage::display(nucSegOnAll, method = "raster")
Overlay segmented cell bodies.
cellSegOnAll = paintObjects(cellbodies, tgt = nucSegOnAll, col = "#ff0080")
EBImage::display(cellSegOnAll, method = "raster")
Now that we have the segmentations nuclei and cellbodies together with the original image data cells, we can compute various descriptors, or features, for each cell. We use the base R function table to determine the total number and sizes of the objects.
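For example (label 0 is the background, so it is excluded from the cell count; this is a sketch of the idea):

```r
# Sizes of the segmented objects in pixels
cellSizes = table(cellbodies)
length(cellSizes) - 1                                   # number of cells
summary(as.vector(cellSizes[names(cellSizes) != "0"]))  # cell areas in pixels
```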
Let us now take this further and compute the mean intensity of the DAPI signal (cells[,,1]) in the segmented nuclei, the mean actin intensity (cells[,,3]) in the segmented nuclei, and the mean actin intensity in the cell bodies.
meanNucInt = tapply(cells[,,1], nuclei, mean)
meanActIntInNuc = tapply(cells[,,3], nuclei, mean)
meanActIntInCell = tapply(cells[,,3], cellbodies, mean)
We can visualize the features in pairwise scatterplots. We see that they are correlated with each other, although each feature also carries independent information.
library("GGally")
ggpairs(tibble(meanNucInt, meanActIntInNuc, meanActIntInCell))
In fact, EBImage provides a convenience function computeFeatures that automatically and efficiently computes many commonly used features. Below, we compute features for intensity, shape and texture for each cell from the DAPI channel using the nucleus segmentation (nuclei) and from the actin and tubulin channels using the cell body segmentation (cellbodies).
F1 = computeFeatures(nuclei, cells[,,1], xname = "nuc",
refnames = "nuc")
F2 = computeFeatures(cellbodies, cells[,,2], xname = "cell",
refnames = "tub")
F3 = computeFeatures(cellbodies, cells[,,3], xname = "cell",
refnames = "act")
dim(F1)
## [1] 43 89
F1 is a matrix with 43 rows (one for each cell) and 89 columns, one for each of the computed features.
F1[1:3, 1:5]
## nuc.0.m.cx nuc.0.m.cy nuc.0.m.majoraxis nuc.0.m.eccentricity nuc.0.m.theta
## 1 119.5523 17.46895 44.86819 0.8372059 -1.314789
## 2 143.4511 15.83709 26.15009 0.6627672 -1.213444
## 3 336.5401 11.48175 18.97424 0.8564444 1.470913
The column names encode the type of feature, as well as the color channel(s) and segmentation mask on which it was computed. For example, we could now use multivariate analysis methods to draw inferences about our dataset.
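As a sketch of such an analysis, we could look at the first two principal components of the nuclear features; constant columns are dropped before scaling (this is illustrative, not part of the required lab output):

```r
# PCA on the per-cell nuclear features
keep = apply(F1, 2, sd) > 0              # drop features with zero variance
pcaNuc = prcomp(F1[, keep], scale. = TRUE)
plot(pcaNuc$x[, 1], pcaNuc$x[, 2],
     xlab = "PC1", ylab = "PC2", pch = 16)
```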